ABSTRACT


In  this  paper  we  obtain  an  asymptotic  lower  bound  for  the  entropy  of 
a  multinomial  population  with  an  unknown  and  perhaps  countably  infinite 
number  of  classes.  This  bound  is  a  function  of  the  first  k  +  1  occupancy 
numbers  of  a  random  sample,  and  is  a  useful  estimator  when  most  of  the 
sample  information  is  contained  in  the  low  order  occupancy  numbers. 


AN ASYMPTOTIC LOWER BOUND FOR THE
ENTROPY OF DISCRETE POPULATIONS WITH APPLICATION TO
THE ESTIMATION OF ENTROPY FOR UNIFORM POPULATIONS

L. B. Cobb and Bernard Harris


Introduction and Summary. Assume that a random sample of size $N$ has been drawn from a multinomial population with an unknown and perhaps countably infinite number of classes. That is, if $X_j$ is the $j$th observation and $M_i$ the $i$th class, then
In addition, in Harris [1], it is shown that the moments of $F(x)$, $\mu_r$, are approximately given by

(4)  $\mu_r \approx \dfrac{(r+1)!\,E(n_{r+1})}{E(n_1)}$ .


If we then replace the expected values in (4) by the observed values, defining

$m_r = \dfrac{(r+1)!\,n_{r+1}}{n_1}$ ,

estimates of the moments of $F(x)$ are obtained. Then, let $\mathcal{F}_k[a, b]$ be the set of cumulative distribution functions with $F(a-0) = 0$, $F(b) = 1$, and

$\int_a^b x^j\,dF(x) = m_j$ ,  $j = 1, 2, \ldots, k$.
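The moment estimates defined above can be computed directly from the occupancy numbers. The following is a minimal sketch in Python and is not part of the original paper; the function name `moment_estimates`, the dictionary layout for the occupancy numbers, and the sample values are assumptions made for illustration.

```python
from math import factorial

def moment_estimates(occupancy, k):
    """Return [m_1, ..., m_k] with m_r = (r+1)! * n_{r+1} / n_1.

    `occupancy` maps r to n_r, the number of classes observed exactly
    r times in the sample (hypothetical input layout).
    """
    n1 = occupancy.get(1, 0)
    if n1 == 0:
        raise ValueError("n_1 must be positive for the estimates to exist")
    return [factorial(r + 1) * occupancy.get(r + 1, 0) / n1
            for r in range(1, k + 1)]

# Made-up occupancy numbers: n_1 = 300, n_2 = 200, n_3 = 120.
m1, m2 = moment_estimates({1: 300, 2: 200, 3: 120}, k=2)
```

Only the first k + 1 occupancy numbers enter the computation, which matches the statement that the bound depends on the low order occupancy numbers alone.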
Since $p_1, p_2, \ldots$ are all assumed to be unknown, $F(x)$ is unknown, and an asymptotic lower bound to (3) may be found by minimizing

$\int e^{x}\log\dfrac{N}{x}\,dF(x)$

over the set
$\mathcal{F}_k[0, N]$. This process uses only the information contained in the first $k + 1$ occupancy numbers $n_1, n_2, \ldots, n_{k+1}$, and is particularly useful when the sample information concerning the parameters $p_1, p_2, \ldots$ is concentrated in the low order occupancy numbers. This occurs, for example, if as $N \to \infty$, $p_j \to 0$, $j = 1, 2, \ldots$, in such a way that $0 < Np_j < k$, whence V is approximately $k + 1$.
The minimum is explicitly computed for $k = 2$. The process employed here is compared with the maximum likelihood estimates of entropy for uniform populations with $p_j = 1/M$, $j = 1, \ldots, M$, and $M \to \infty$ as $N \to \infty$ so that $N/M$ tends to a positive limit.


2. The computation of the lower bound for entropy. In Harris [1], it was shown that for $r^2 = o(N)$ as $N \to \infty$,

where the approximation is valid in the sense that either both sides are negligible, or the ratio of the two sides approaches unity.

In  particular. 
Let $h(x) = e^{x}\log\dfrac{N}{x}$. Then we wish to determine $F_0(x) \in \mathcal{F}_2[0, N]$ such that
Since $h(0)$ does not exist, we consider instead $\mathcal{F}_2[\epsilon, N]$, where $\epsilon > 0$ is arbitrary. Then $h(x)$ is bounded on $[\epsilon, N]$ for every $\epsilon > 0$ and it is well-known [1] that $F_0(x)$ defined by

$\min_{F \in \mathcal{F}_2[\epsilon, N]} \int_\epsilon^N h(x)\,dF(x) = \int_\epsilon^N h(x)\,dF_0(x)$ ,


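is obtainable as a discrete cumulative distribution function with at most three jumps, say at $x_1, x_2, x_3$, $\epsilon \le x_1 < x_2 < x_3 \le N$. Hence, there exist $\lambda_1, \lambda_2, \lambda_3$ with

$\lambda_1 + \lambda_2 + \lambda_3 = 1$

such that

(9)  $\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 = m_1$ ,
     $\lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2 = m_2$ ,

(10)  $F_0(x) = \begin{cases} 0, & x < x_1 \\ \lambda_1, & x_1 \le x < x_2 \\ \lambda_1 + \lambda_2, & x_2 \le x < x_3 \\ 1, & x \ge x_3 \end{cases}$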
whenever $m_2 \ge m_1^2$, a condition which we will assume throughout the remainder of this discussion. With no loss in generality, we may assume that $m_2 > m_1^2$, since otherwise $F_0(x)$ is a cumulative distribution function with exactly one jump and (8) has a trivial solution.


It can be shown that $\lambda_i \ge 0$, $i = 1, 2, 3$, if and only if

(11)  $(-1)^{i+1}\bigl(x_j x_k - m_1(x_j + x_k) + m_2\bigr) \ge 0$ , where $j$ and $k$ are the two indices other than $i$.

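In addition, from Harris [1], there exist real numbers $\alpha_0, \alpha_1, \alpha_2$ such that $x_1, x_2, x_3$ are roots of

(12)  $g(x) = \sum_{i=0}^{2} \alpha_i x^i - h(x) = 0$ ,

and

(13)  $\sum_{i=0}^{2} \alpha_i x^i - h(x) \le 0$ ,  $\epsilon \le x \le N$ .

From (11) and (12), we also have that for $\epsilon < x_i < N$, $i = 1, 2, 3$,

(14)  $g'(x_i) = \alpha_1 + 2\alpha_2 x_i - h'(x_i) = 0$ .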

To solve (9), (12), (13) and (14), observe that there exist numbers $\delta_1, \delta_2, \delta_3$, $0 < \delta_1 < \delta_2 < \delta_3 < N$, such that

X  b  J  i  ^  V 


and 


with 


h*(x) 


h"(x) 


$N \to \infty$
$N \to \infty$


and  h"(x)  is  strictly  decreasing  on  (0,  6^)  and  (6^,  N).  We  now  establish 


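the following

Lemma. If $\epsilon < x_1 < x_2 < N$ $(0 < \epsilon < \delta_1)$, the following conditions cannot be satisfied simultaneously:

(15)  $\sum_{i=0}^{2} \alpha_i x^i - h(x) \le 0$ ,  $\epsilon \le x \le N$ ,

(16)  $\sum_{i=0}^{2} \alpha_i x_j^i - h(x_j) = 0$ ,  $j = 1, 2$ .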


Proof. Assume (15) and (16) hold. Let $p(x) = \sum_{i=0}^{2} \alpha_i x^i$. Then

(17)  $h'(x_j) = p'(x_j)$ ,  $j = 1, 2$ .

Let $I_1 = (\epsilon, \delta_1]$, $I_2 = (\delta_1, \delta_2]$, $I_3 = (\delta_2, N]$. If $\alpha_2 > 0$ and $x_2 \in I_3$, then since $p(x)$ is strictly convex and $h(x)$ is strictly concave in $I_3$, by (16) and (17), we have $p(x_0) > h(x_0)$ for some $x_0 \in I_3$, contradicting (15). If $x_2 \in I_2$, then $p'(x_2) > 0$, hence $p(N) > p(x_2) > 0 = h(N)$, contradicting (15). If $x_2 \in I_1$, then $\epsilon < x_1 < x_2 \le \delta_1$, and by (16) and Rolle's Theorem, there exist $\xi_1, \xi_2$, $x_1 < \xi_1 < \xi_2 < x_2$, such that $g''(\xi_j) = 0$, $j = 1, 2$. This, however, implies that $h''(\xi_j) = 2\alpha_2$, $j = 1, 2$, contradicting the monotonicity of $h''(x)$.

If $\alpha_2 < 0$, the argument is similar. The case $\alpha_2 = 0$ is trivial.

We now obtain $F_0(x)$.


Theorem 1. There exists a unique cumulative distribution function

such that

Proof. By the above lemma, we have $x_1 = \epsilon$, $\epsilon < x_2 < N$, $x_3 = N$. From (11), we have


(19) 


^*^1  '  ft'.  ^  '  m  < 
N - m
Thus, by (9), we have


$N x_2 - m_1(N + x_2) + m_2$ ,

This gives a parametric family of cumulative distribution functions $F_{0, x_2, \epsilon}(x)$.


Since $\lim_{x \to 0+} h(x) = \infty$, we must have $\lambda_1(x_2, \epsilon) = O\!\left(\dfrac{1}{h(\epsilon)}\right)$, $\epsilon \to 0$, since otherwise $F_0(x)$ would not satisfy (8). Hence $\lim_{\epsilon \to 0} \lambda_1(x_2, \epsilon) = 0$ and

$x_2 \to \dfrac{N m_1 - m_2}{N - m_1}$ as $\epsilon \to 0$.

Since $\lambda_1(x_2, \epsilon)\,h(\epsilon) \ge 0$ for every $\epsilon > 0$, it follows that $\lambda_1(x_2, \epsilon) = o\!\left(\dfrac{1}{h(\epsilon)}\right)$ as $\epsilon \to 0$, establishing the theorem.

Finally  we  have: 

Theorem  2.  The  required  lower  bound  for  the  entropy  is 


$\dfrac{n_1}{N}\int h(x)\,dF_0(x) =$

log
Remark. Krein [2] has studied minimization problems similar to (8). However, Krein's methods require that $1, x, x^2, h(x)$ form a Tschebycheffian system of functions on $[\epsilon, N]$. A necessary condition for the above (see Pólya and Szegő [3]) is that the Wronskians


$W(x) = \begin{vmatrix} 1 & x & x^2 & h(x) \\ 0 & 1 & 2x & h'(x) \\ 0 & 0 & 2 & h''(x) \\ 0 & 0 & 0 & h'''(x) \end{vmatrix}$ ,  $\epsilon \le x \le N$ ,


be non-negative (non-positive) on $[\epsilon, N]$. This condition is clearly not satisfied in this case and Krein's methods are therefore inapplicable.
3. The Estimation of the Entropy of Uniform Populations. Let


$p_j = \begin{cases} 1/M, & j = 1, 2, \ldots, M \\ 0, & \text{otherwise} \end{cases}$

if


and for $M = 1000$, $N = 100$, we have $E(n_1) = 90.48$, $E(n_2) = 4.52$, $E(n_3) = .15$

obtaining 

E(I^)  =  4.271 

and log M = 6.908.
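The expected occupancy numbers quoted above agree with the Poisson form $E(n_r) \approx M (N/M)^r e^{-N/M} / r!$ for a uniform population. The sketch below is an illustration and not part of the original paper; the function name `expected_occupancy` is hypothetical, and the use of the Poisson form is an assumption inferred from the quoted values.

```python
from math import exp, factorial, log

def expected_occupancy(r, N, M):
    """Approximate E(n_r) for a uniform population with p_j = 1/M.

    Assumes the Poisson approximation with mean N/M per class.
    """
    mean = N / M
    return M * mean**r * exp(-mean) / factorial(r)

# Values for M = 1000, N = 100, rounded to two decimals as in the text.
values = [round(expected_occupancy(r, N=100, M=1000), 2) for r in (1, 2, 3)]
log_M = round(log(1000), 3)
```

With these inputs the rounded values match the figures 90.48, 4.52 and .15 given in the text, and `log_M` matches 6.908.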

Example.  Three  random  samples  were  chosen  with  N  =  1000,  M  =  1000. 
The  data  are  summarized  below. 


Sample  #1  Sample  #2  Sample  #3 
$H(p_1, \ldots, p_M)$

6.908

6.908

fl

6.364

6.294
In sample #1, $m_2 < m_1^2$; then, supposing $F_0(x)$ to be degenerate with a jump of 1 at $m_1$, we get, using $m_2 = m_1^2$, $\dfrac{n_1}{N}\int h(x)\,dF_0(x) = 7.419$.
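The computation of the bound for $k = 2$ can be sketched numerically. The following Python is an illustration under stated assumptions and not the authors' code: it takes $h(x) = e^{x}\log(N/x)$ (a reading of the damaged source), it uses the limiting form suggested by the proof of Theorem 1, in which the mass sits at $x_2 = (N m_1 - m_2)/(N - m_1)$ and at $N$ with $h(N) = 0$, and the function names `h` and `lower_bound_k2` are hypothetical.

```python
from math import exp, log

def h(x, N):
    # Assumed integrand: h(x) = e^x * log(N / x); note that h(N) = 0.
    return exp(x) * log(N / x)

def lower_bound_k2(n1, m1, m2, N):
    """Sketch of the k = 2 bound from n_1 and the moment estimates m_1, m_2.

    Returns (x2, lam2, bound), where lam2 is the mass placed at x2 and
    the remaining mass 1 - lam2 sits at N.
    """
    if m2 <= m1 * m1:
        # Degenerate case described in the text: one jump of size 1 at m_1.
        return m1, 1.0, (n1 / N) * h(m1, N)
    x2 = (N * m1 - m2) / (N - m1)   # limiting interior jump point
    if x2 <= 0:
        raise ValueError("x2 must be positive for h(x2) to be defined")
    lam2 = (N - m1) / (N - x2)      # chosen so that the mean equals m_1
    return x2, lam2, (n1 / N) * lam2 * h(x2, N)

# Uniform check, N = M = 1000, using expected values n_1 = N/e, m_1 = m_2 = 1.
x2, lam2, bound = lower_bound_k2(n1=1000 * exp(-1), m1=1.0, m2=1.0, N=1000)
```

In the uniform check the degenerate branch applies and the result equals log 1000, about 6.908, the true entropy of the uniform population on 1000 classes.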